Ordinary Kriging (OK)
Contents
Ordinary Kriging (OK)#
TODO define ordinary Kriging
[1]:
import context
import salem
import numpy as np
import pandas as pd
import xarray as xr
import geopandas as gpd
from pykrige.ok import OrdinaryKriging
import plotly.express as px
from datetime import datetime
from utils.utils import pixel2poly, plotvariogram
from context import data_dir
******************************
context imported. Front of path:
/Users/rodell/krige-smoke
/Users/rodell/krige-smoke/docs/source
******************************
through /Users/rodell/krige-smoke/docs/source/context.py -- pha
Open the reformated data with the linear, meter-based, Lambert projection (EPSG:3347). Again this is helpful as lat/lon coordinates are not good for measuring distances which is important for spatial interpolation.
[2]:
df = pd.read_csv(str(data_dir) + "/obs/gpm25.csv")
gpm25 = gpd.GeoDataFrame(
df,
crs="EPSG:4326",
geometry=gpd.points_from_xy(df["lon"], df["lat"]),
).to_crs("EPSG:3347")
gpm25["Easting"], gpm25["Northing"] = gpm25.geometry.x, gpm25.geometry.y
gpm25.head()
[2]:
| Unnamed: 0 | id | datetime | lat | lon | PM2.5 | geometry | Easting | Northing | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 42073 | 2021-07-16 22:00:00 | 47.185173 | -122.176855 | 0.862 | POINT (3972238.901 1767531.888) | 3.972239e+06 | 1.767532e+06 |
| 1 | 3 | 53069 | 2021-07-16 22:00:00 | 47.190197 | -122.177992 | 1.078 | POINT (3972419.863 1768071.699) | 3.972420e+06 | 1.768072e+06 |
| 2 | 12 | 10808 | 2021-07-16 22:00:00 | 40.507316 | -111.899188 | 9.780 | POINT (4460286.051 743178.640) | 4.460286e+06 | 7.431786e+05 |
| 3 | 16 | 85391 | 2021-07-16 22:00:00 | 48.454213 | -123.283643 | 0.874 | POINT (3964698.001 1931774.531) | 3.964698e+06 | 1.931775e+06 |
| 4 | 21 | 79095 | 2021-07-16 22:00:00 | 47.672130 | -122.514183 | 0.784 | POINT (3974631.739 1827689.201) | 3.974632e+06 | 1.827689e+06 |
Create Grid#
Here we will create a grid we want to use for the interpolation. - NOTE we will use salem to create a dataset with the grid. This will be more useful for the universal kriging when we reproject other gridded data to act as covariances for interpolation
[3]:
## define the desired grid resolution in meters
resolution = 20_000 # grid cell size in meters
## make grid based on dataset bounds and resolution
gridx = np.arange(gpm25.bounds.minx.min(), gpm25.bounds.maxx.max(), resolution)
gridy = np.arange(gpm25.bounds.miny.min(), gpm25.bounds.maxy.max(), resolution)
## use salem to create a dataset with the grid.
krig_ds = salem.Grid(
nxny=(len(gridx), len(gridy)),
dxdy=(resolution, resolution),
x0y0=(gpm25.bounds.minx.min(), gpm25.bounds.miny.min()),
proj="epsg:3347",
pixel_ref="corner",
).to_dataset()
## print dataset
krig_ds
[3]:
<xarray.Dataset>
Dimensions: (x: 145, y: 122)
Coordinates:
* x (x) float64 3.46e+06 3.48e+06 3.5e+06 ... 6.3e+06 6.32e+06 6.34e+06
* y (y) float64 4.984e+05 5.184e+05 5.384e+05 ... 2.898e+06 2.918e+06
Data variables:
*empty*
Attributes:
pyproj_srs: +proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=4...Setup OK#
[4]:
nlags = 15
variogram_model = "spherical"
startTime = datetime.now()
krig = OrdinaryKriging(
x=gpm25["Easting"],
y=gpm25["Northing"],
z=gpm25["PM2.5"],
variogram_model=variogram_model,
enable_statistics=True,
nlags=nlags,
)
print(f"OK build time {datetime.now() - startTime}")
OK build time 0:03:03.487977
Variogram#
Graphical representation of spatial autocorrelation.
Shows a fundamental principle of geography: closer things are more alike than things farther apart
Its created by calculating the difference squared between the values of the paired locations
paired locations are binned by the distance apart
An empirical model is fitted to the binned (paired locations) to describe the likeness of data at a distance.
Type of empirical models
Circular
Spherical
Exponential
Gaussian
Linear
The fitted model is applied in the interpolation process by forming (kriging) weights for the predicted areas.
Three parameters that define a variogram..
sill: the total variance where the empirical model levels off,
is the sum of the nugget plus the sills of each nested structure.
(effective) range: The distance after which data are no longer correlated.
About the distance where the variogram levels off to the sill.
nugget: Related to the amount of short range variability in the data.
Choose a value for the best fit with the first few empirical variogram points.
A nugget that’s large relative to the sill is problematic and could indicate too much noise and not enough spatial correlation.
variogram statistics#
A good model should result in - Q1 close to zero, - Q2 close to one, and - cR as small as possible. TODO define above stats variables.
[5]:
plotvariogram(krig)
Execute OK#
Interpolate data to our grid using OK.
[6]:
startTime = datetime.now()
z, ss = krig.execute("grid", gridx, gridy)
print(f"OK execution time {datetime.now() - startTime}")
OK_pm25 = np.where(z < 0, 0, z)
# krig_ds["OK_pm25"] = (("y", "x"), OK_pm25)
OK execution time 0:00:05.391558
Plot OK#
Convert data to polygons to be plot-able on a slippy mapbox. This is not necessary but but :)
[7]:
polygons, values = pixel2poly(gridx, gridy, OK_pm25, resolution)
pm25_model = gpd.GeoDataFrame(
{"Modelled PM2.5": values}, geometry=polygons, crs="EPSG:3347"
).to_crs("EPSG:4326")
fig = px.choropleth_mapbox(
pm25_model,
geojson=pm25_model.geometry,
locations=pm25_model.index,
color="Modelled PM2.5",
color_continuous_scale="jet",
center={"lat": 50.0, "lon": -110.0},
zoom=2.5,
mapbox_style="carto-positron",
opacity=0.8,
)
fig.update_layout(margin=dict(l=0, r=0, t=30, b=10))
fig.update_traces(marker_line_width=0)